

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  
  <title>CephFS health messages &mdash; Ceph Documentation</title>
  

  
  <link rel="stylesheet" href="../../_static/ceph.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/graphviz.css" type="text/css" />
  <link rel="stylesheet" href="../../_static/css/custom.css" type="text/css" />

  
  

  
  

  

  
  <!--[if lt IE 9]>
    <script src="../../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
    
      <script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
        <script src="../../_static/jquery.js"></script>
        <script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
        <script src="../../_static/doctools.js"></script>
        <script src="../../_static/sphinx_highlight.js"></script>
    
    <script type="text/javascript" src="../../_static/js/theme.js"></script>

    
    <link rel="index" title="Index" href="../../genindex/" />
    <link rel="search" title="Search" href="../../search/" />
    <link rel="next" title="Upgrading the MDS cluster" href="../upgrading/" />
    <link rel="prev" title="CephFS quotas" href="../quota/" /> 
</head>

<body class="wy-body-for-nav">

   
  <div class="wy-grid-for-nav">
    

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      


      <div class="wy-nav-content">
        
        <div class="rst-content">
        
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
<div id="dev-warning" class="admonition note">
  <p class="first admonition-title">Notice</p>
  <p class="last">This document is for a development version of Ceph.</p>
</div>

  
  <section id="cephfs">
<span id="cephfs-health-messages"></span><h1>CephFS health messages<a class="headerlink" href="#cephfs" title="Permalink to this heading"></a></h1>
<section id="id1">
<h2>Cluster health checks<a class="headerlink" href="#id1" title="Permalink to this heading"></a></h2>
<p>The Ceph monitor daemons generate health messages in response to certain states
of the file system map structure (and the MDS maps enclosed in it).</p>
<dl class="field-list simple">
<dt class="field-odd">Message<span class="colon">:</span></dt>
<dd class="field-odd"><p>mds rank(s) <em>ranks</em> have failed</p>
</dd>
<dt class="field-even">Description<span class="colon">:</span></dt>
<dd class="field-even"><p>One or more MDS ranks are not currently assigned to an MDS daemon. The cluster will not recover until a suitable replacement daemon starts.</p>
</dd>
</dl>
<hr class="docutils" />
<dl class="field-list simple">
<dt class="field-odd">Message<span class="colon">:</span></dt>
<dd class="field-odd"><p>mds rank(s) <em>ranks</em> are damaged</p>
</dd>
<dt class="field-even">Description<span class="colon">:</span></dt>
<dd class="field-even"><p>One or more MDS ranks have encountered severe damage to their stored metadata and cannot start again until that metadata is repaired.</p>
</dd>
</dl>
<hr class="docutils" />
<dl class="field-list simple">
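<dt class="field-odd">Message<span class="colon">:</span></dt>
<dd class="field-odd"><p>mds cluster is degraded</p>
</dd>
<dt class="field-even">Description<span class="colon">:</span></dt>
<dd class="field-even"><p>One or more MDS ranks are not currently up and running, so clients may pause metadata IO until the situation is resolved. This covers ranks that are failed or damaged, and also ranks that are assigned to an MDS
but have not yet reached the <em>active</em> state (for example, ranks currently in the <em>replay</em> state).</p>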
</dd>
</dl>
<hr class="docutils" />
<dl class="field-list simple">
<dt class="field-odd">Message<span class="colon">:</span></dt>
<dd class="field-odd"><p>mds <em>names</em> are laggy</p>
</dd>
<dt class="field-even">Description<span class="colon">:</span></dt>
<dd class="field-even"><p>The named MDS daemons have not sent a beacon message to the monitor for at least <code class="docutils literal notranslate"><span class="pre">mds_beacon_grace</span></code> seconds (default 15s), although they are supposed to send one every <code class="docutils literal notranslate"><span class="pre">mds_beacon_interval</span></code> seconds (default 4s). The daemons may have crashed.
The Ceph monitor automatically replaces laggy daemons with standbys.</p>
</dd>
</dl>
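<p>The beacon rule above can be sketched as a simple timeout check. This is only an illustration of the rule as described here, not the monitor's actual code; the function name <code class="docutils literal notranslate"><span class="pre">is_laggy</span></code> and its parameters are hypothetical, and the defaults are the values quoted above.</p>

```python
MDS_BEACON_GRACE = 15.0     # seconds, default quoted in the text above
MDS_BEACON_INTERVAL = 4.0   # seconds, default quoted in the text above


def is_laggy(last_beacon: float, now: float,
             grace: float = MDS_BEACON_GRACE) -> bool:
    """Return True when no beacon has been seen for at least `grace` seconds.

    Hypothetical helper: `last_beacon` and `now` are timestamps in seconds.
    """
    return (now - last_beacon) >= grace


# With the defaults, three missed beacons (12 s of silence) are tolerated,
# while 15 s of silence marks the daemon as laggy.
three_missed = is_laggy(last_beacon=0.0, now=3 * MDS_BEACON_INTERVAL)
silent_for_grace = is_laggy(last_beacon=0.0, now=MDS_BEACON_GRACE)
```

<p>With the default values, a daemon can miss a few consecutive beacons before the grace period expires.</p>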
<hr class="docutils" />
<dl class="field-list simple">
<dt class="field-odd">Message<span class="colon">:</span></dt>
<dd class="field-odd"><p>insufficient standby daemons available</p>
</dd>
<dt class="field-even">Description<span class="colon">:</span></dt>
<dd class="field-even"><p>One or more file systems are configured to have a certain number of standby daemons available
(including daemons in standby-replay), but the cluster does not have enough of them.
Standby daemons that are not in replay count towards any file system (that is, they may overlap). This warning is configured with
<code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">fs</span> <span class="pre">set</span> <span class="pre">&lt;fs&gt;</span> <span class="pre">standby_count_wanted</span> <span class="pre">&lt;count&gt;</span></code>;
setting <code class="docutils literal notranslate"><span class="pre">count</span></code> to 0 disables the check.</p>
</dd>
</dl>
</section>
<section id="id2">
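<h2>Daemon-reported health checks<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h2>
<p>MDS daemons can identify a variety of unwanted conditions and report them to the operator in the output of <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">status</span></code>.
Each condition has a human-readable message and a unique code, which also appears in JSON-formatted output. Most codes start with <code class="docutils literal notranslate"><span class="pre">MDS_</span></code>; a few file-system-level checks listed below start with <code class="docutils literal notranslate"><span class="pre">FS_</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">health</span> <span class="pre">detail</span></code> shows the details of the current conditions.
The following is a typical health report from a cluster experiencing MDS-related performance issues:</p>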
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>ceph<span class="w"> </span>health<span class="w"> </span>detail
<span class="go">HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests</span>
<span class="go">MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs</span>
<span class="go">   mds.fs-01(mds.0): 3 slow metadata IOs are blocked &gt; 30 secs, oldest blocked for 51123 secs</span>
<span class="go">MDS_SLOW_REQUEST 1 MDSs report slow requests</span>
<span class="go">   mds.fs-01(mds.0): 5 slow requests are blocked &gt; 30 secs</span>
</pre></div>
</div>
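<p>For scripting, the plain-text report can be grouped by health check code. The following is a minimal sketch that assumes exactly the layout shown in the sample above (one summary line, then one unindented line per code followed by indented detail lines); other releases may format the output differently. The function name <code class="docutils literal notranslate"><span class="pre">parse_health_detail</span></code> is hypothetical.</p>

```python
def parse_health_detail(text: str) -> dict:
    """Group `ceph health detail` text output by health check code.

    Assumes the layout shown in the sample report: the first line is the
    overall summary, unindented lines start a new check, and indented
    lines carry per-daemon details for the most recent check.
    """
    checks = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the overall HEALTH_* line
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: detail belonging to the current check.
            if current is not None:
                checks[current]["details"].append(line.strip())
        else:
            # Unindented line: "<CODE> <summary text>".
            code, _, summary = line.partition(" ")
            current = code
            checks[code] = {"summary": summary, "details": []}
    return checks


SAMPLE = """\
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
   mds.fs-01(mds.0): 3 slow metadata IOs are blocked > 30 secs, oldest blocked for 51123 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
   mds.fs-01(mds.0): 5 slow requests are blocked > 30 secs
"""

report = parse_health_detail(SAMPLE)
```

<p>Here <code class="docutils literal notranslate"><span class="pre">report</span></code> maps each code to its summary and its list of per-daemon detail lines.</p>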
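<p>Here, for instance, <code class="docutils literal notranslate"><span class="pre">MDS_SLOW_REQUEST</span></code> is the unique code for the condition
in which some requests are taking a long time to complete. The text that follows it shows the severity of the message and which MDS daemons are serving the slow requests.</p>
<p>This page lists the health checks raised by MDS daemons. For checks raised by other daemons,
see <a class="reference internal" href="../../rados/operations/health-checks/#health-checks"><span class="std std-ref">Health checks</span></a>.</p>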
<section id="mds-trim">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_TRIM</span></code><a class="headerlink" href="#mds-trim" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
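<dt>Message</dt><dd><p>“Behind on trimming…”</p>
</dd>
<dt>Description</dt><dd><p>CephFS maintains a metadata journal that is divided into <em>log segments</em>.
The length of the journal (in number of segments) is controlled by the <code class="docutils literal notranslate"><span class="pre">mds_log_max_segments</span></code> setting.
When the number of segments exceeds that setting, the MDS starts writing back metadata so that it can remove (trim)
the oldest segments. If this writeback happens too slowly, or a software bug prevents trimming,
this health message may appear. The threshold at which it appears is controlled by the config option
<code class="docutils literal notranslate"><span class="pre">mds_log_warn_factor</span></code>, which defaults to 2.0.</p>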
</dd>
</dl>
</div></blockquote>
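<p>As a rough illustration, the check can be read as a comparison of the current segment count against a scaled limit. This sketch assumes that the warning threshold is <code class="docutils literal notranslate"><span class="pre">mds_log_max_segments</span></code> multiplied by <code class="docutils literal notranslate"><span class="pre">mds_log_warn_factor</span></code>; the text above does not spell out the exact formula, and the function name is hypothetical.</p>

```python
def behind_on_trimming(num_segments: int, max_segments: int,
                       warn_factor: float = 2.0) -> bool:
    """Return True when the journal is long enough to trigger the warning.

    Assumption: the warning threshold is max_segments * warn_factor.
    `max_segments` stands for mds_log_max_segments and `warn_factor`
    for mds_log_warn_factor (default 2.0, as quoted above).
    """
    return num_segments > max_segments * warn_factor


# Example with an illustrative limit of 128 segments: 256 segments are
# still at the threshold, 257 segments exceed it.
at_threshold = behind_on_trimming(256, max_segments=128)
over_threshold = behind_on_trimming(257, max_segments=128)
```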
</section>
<section id="mds-health-client-late-release-mds-health-client-late-release-many">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_HEALTH_CLIENT_LATE_RELEASE</span></code>, <code class="docutils literal notranslate"><span class="pre">MDS_HEALTH_CLIENT_LATE_RELEASE_MANY</span></code><a class="headerlink" href="#mds-health-client-late-release-mds-health-client-late-release-many" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Client <em>name</em> failing to respond to capability release”</p>
</dd>
<dt>Description</dt><dd><p>CephFS clients are issued <em>capabilities</em> by the MDS, which work like locks.
Sometimes, for example when another client needs access, the MDS asks clients to release their capabilities.
If a client is unresponsive or buggy, it might fail to do so promptly, or fail to do so at all.
This message appears if a client has taken longer than <code class="docutils literal notranslate"><span class="pre">session_timeout</span></code> (default 60s)
to comply.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-client-recall-mds-health-client-recall-many">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_CLIENT_RECALL</span></code>, <code class="docutils literal notranslate"><span class="pre">MDS_HEALTH_CLIENT_RECALL_MANY</span></code><a class="headerlink" href="#mds-client-recall-mds-health-client-recall-many" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Client <em>name</em> failing to respond to cache pressure”</p>
</dd>
<dt>Description</dt><dd><p>Clients maintain their own metadata cache. Items in the client cache (such as inodes)
are also pinned in the MDS cache, so when the MDS needs to shrink its cache
(to stay within <code class="docutils literal notranslate"><span class="pre">mds_cache_memory_limit</span></code>),
it sends messages to clients asking them to shrink their caches too. If a client is unresponsive or buggy,
the MDS may be unable to stay within its cache limit and may eventually run out of memory and crash.
This message appears if a client has failed to release more than <code class="docutils literal notranslate"><span class="pre">mds_recall_warning_threshold</span></code> capabilities
(decaying with a half-life of <code class="docutils literal notranslate"><span class="pre">mds_recall_max_decay_rate</span></code>) within the last <code class="docutils literal notranslate"><span class="pre">mds_recall_warning_decay_rate</span></code> seconds.</p>
</dd>
</dl>
</div></blockquote>
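<p>The phrase “decaying with a half-life” can be made concrete with a small counter whose value halves after every half-life. The class below is a generic sketch of that idea, not the MDS implementation; the class and method names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.</p>

```python
class DecayCounter:
    """A counter whose value decays exponentially with a given half-life."""

    def __init__(self, half_life: float):
        self.half_life = half_life  # seconds after which the value halves
        self.value = 0.0
        self.last = 0.0             # timestamp of the last update

    def _decay(self, now: float) -> None:
        elapsed = now - self.last
        if elapsed > 0:
            # Halve the value once per elapsed half-life.
            self.value *= 0.5 ** (elapsed / self.half_life)
            self.last = now

    def hit(self, now: float, amount: float = 1.0) -> None:
        """Decay to `now`, then add `amount` to the counter."""
        self._decay(now)
        self.value += amount

    def get(self, now: float) -> float:
        """Return the decayed value at time `now`."""
        self._decay(now)
        return self.value


counter = DecayCounter(half_life=10.0)
counter.hit(now=0.0, amount=100.0)
after_one_half_life = counter.get(now=10.0)
after_two_half_lives = counter.get(now=20.0)
```

<p>A warning rule of the kind described above would compare such a decayed value against a threshold rather than a raw running total, so that old activity gradually stops counting.</p>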
</section>
<section id="mds-client-oldest-tid-mds-client-oldest-tid-many">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_CLIENT_OLDEST_TID</span></code>, <code class="docutils literal notranslate"><span class="pre">MDS_CLIENT_OLDEST_TID_MANY</span></code><a class="headerlink" href="#mds-client-oldest-tid-mds-client-oldest-tid-many" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
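<dt>Message</dt><dd><p>“Client <em>name</em> failing to advance its oldest client/flush tid”</p>
</dd>
<dt>Description</dt><dd><p>The CephFS client-MDS protocol uses a field called the <em>oldest tid</em> to tell the MDS which client requests are fully complete and can therefore be forgotten by the MDS.
If a buggy client fails to advance this field, the MDS may be unable to clean up the resources used by those requests. This message appears when a client has more than
<code class="docutils literal notranslate"><span class="pre">max_completed_requests</span></code>
(default 100000) requests that are complete on the MDS side but not yet accounted for in the client's <em>oldest tid</em> value.
The last tid used by the MDS to trim completed client requests (or flushes)
is included in the output of the <cite>session ls</cite> (or <cite>client ls</cite>) command as a debugging aid.</p>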
</dd>
</dl>
</div></blockquote>
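<p>The bookkeeping described above can be sketched as follows. This is a simplified model under the assumption that every completed request with a tid below the client's oldest tid may be forgotten, and that the rest must be retained; the function names are hypothetical and the real MDS data structures are not shown here.</p>

```python
def retained_completed(completed_tids, oldest_tid):
    """Return completed request tids the MDS must still remember.

    Assumption: requests with tid < oldest_tid have been acknowledged by
    the client and can be dropped; all others are retained.
    """
    return [tid for tid in completed_tids if tid >= oldest_tid]


def should_warn(completed_tids, oldest_tid, max_completed_requests=100000):
    """Return True when the retained requests exceed the configured limit."""
    retained = retained_completed(completed_tids, oldest_tid)
    return len(retained) > max_completed_requests


# A client that never advances its oldest tid keeps every request pinned.
stuck = should_warn(range(10), oldest_tid=0, max_completed_requests=5)
# A client that advances it lets the MDS forget most of them.
healthy = should_warn(range(10), oldest_tid=8, max_completed_requests=5)
```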
</section>
<section id="mds-damage">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_DAMAGE</span></code><a class="headerlink" href="#mds-damage" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
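<dt>Message</dt><dd><p>“Metadata damage detected”</p>
</dd>
<dt>Description</dt><dd><p>Corrupt or missing metadata was encountered when reading from the metadata pool. This message indicates that the damage was isolated well enough for the MDS to continue operating, although client accesses to the damaged subtree will return IO errors. Use the <code class="docutils literal notranslate"><span class="pre">damage</span> <span class="pre">ls</span></code> admin socket command to get more detail about the damage. This message appears as soon as any damaged metadata is encountered.</p>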
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-health-read-only">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_HEALTH_READ_ONLY</span></code><a class="headerlink" href="#mds-health-read-only" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
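<dt>Message</dt><dd><p>“MDS in read-only mode”</p>
</dd>
<dt>Description</dt><dd><p>The MDS has entered read-only mode and returns the
EROFS error code to any client operation that attempts to modify metadata. The MDS enters read-only mode if it encounters a write error while writing to the metadata pool, or if an administrator forces it with the <em>force_readonly</em> admin socket command.</p>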
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-slow-request">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_SLOW_REQUEST</span></code><a class="headerlink" href="#mds-slow-request" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
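<dt>Message</dt><dd><p>“<em>N</em> slow requests are blocked”</p>
</dd>
<dt>Description</dt><dd><p>One or more client requests have not completed promptly, which indicates that the MDS is running very slowly, that the RADOS cluster is not acknowledging journal writes promptly, or that there is a bug.
Use the <code class="docutils literal notranslate"><span class="pre">ops</span></code> admin socket command to list outstanding metadata operations.
This message appears if any client request has taken longer than <code class="docutils literal notranslate"><span class="pre">mds_op_complaint_time</span></code>
(default 30s).</p>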
</dd>
</dl>
</div></blockquote>
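<p>The rule can be illustrated by filtering a list of request ages against the complaint time. This is a sketch only: the function name is hypothetical, the input is a plain list of ages in seconds rather than real <code class="docutils literal notranslate"><span class="pre">ops</span></code> output, and the message text simply mirrors the sample report earlier on this page.</p>

```python
def slow_request_summary(op_ages_secs, complaint_time=30):
    """Summarise requests older than `complaint_time` seconds.

    `complaint_time` stands for mds_op_complaint_time (default 30s).
    Returns None when no request is slow.
    """
    slow = [age for age in op_ages_secs if age > complaint_time]
    if not slow:
        return None
    return f"{len(slow)} slow requests are blocked > {complaint_time} secs"


# Ages of four in-flight requests; only two exceed the 30 s default.
summary = slow_request_summary([5, 31, 45.5, 30])
```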
</section>
<section id="mds-cache-oversized">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_CACHE_OVERSIZED</span></code><a class="headerlink" href="#mds-cache-oversized" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Too many inodes in cache”</p>
</dd>
<dt>Description</dt><dd><p>The MDS is not succeeding in trimming its cache to stay within the limit set by the administrator. If the MDS cache becomes too large, the daemon may exhaust available memory and crash. By default, this message appears if the actual cache size (in memory) is at least 50% greater than <code class="docutils literal notranslate"><span class="pre">mds_cache_memory_limit</span></code> (default 4GB). Modify <code class="docutils literal notranslate"><span class="pre">mds_health_cache_threshold</span></code>
to set the warning ratio.</p>
</dd>
</dl>
</div></blockquote>
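<p>In other words, the warning compares the cache size with the limit scaled by a ratio. The sketch below assumes a default ratio of 1.5, inferred from the “at least 50% greater” wording above, and treats 4GB as 4 GiB; the function name is hypothetical and the exact comparison used by the MDS may differ.</p>

```python
GIB = 1024 ** 3


def cache_oversized(cache_bytes: int,
                    limit_bytes: int = 4 * GIB,
                    threshold: float = 1.5) -> bool:
    """Return True when the cache is large enough to trigger the warning.

    `limit_bytes` stands for mds_cache_memory_limit and `threshold` for
    mds_health_cache_threshold. The 1.5 default is an assumption derived
    from the "at least 50% greater" wording.
    """
    return cache_bytes >= limit_bytes * threshold


# With a 4 GiB limit, a 5 GiB cache is tolerated but a 6 GiB cache warns.
five_gib = cache_oversized(5 * GIB)
six_gib = cache_oversized(6 * GIB)
```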
</section>
<section id="fs-with-failed-mds">
<h3><code class="docutils literal notranslate"><span class="pre">FS_WITH_FAILED_MDS</span></code><a class="headerlink" href="#fs-with-failed-mds" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
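<dt>Message</dt><dd><p>“Some MDS ranks do not have standby replacements”</p>
</dd>
<dt>Description</dt><dd><p>Normally, a failed MDS rank is replaced by a standby MDS. That situation is transient and is not considered critical.
However, if no standby MDS is available to take over an active MDS rank, this health warning is generated.</p>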
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-insufficient-standby">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_INSUFFICIENT_STANDBY</span></code><a class="headerlink" href="#mds-insufficient-standby" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Insufficient number of available standby(-replay) MDS daemons than configured”</p>
</dd>
<dt>Description</dt><dd><p>The minimum number of standby(-replay) MDS daemons can be configured with the <code class="docutils literal notranslate"><span class="pre">standby_count_wanted</span></code> configuration variable.
This health warning is generated when fewer standby(-replay) MDS daemons are available than the configured number.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="fs-degraded">
<h3><code class="docutils literal notranslate"><span class="pre">FS_DEGRADED</span></code><a class="headerlink" href="#fs-degraded" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
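<dt>Message</dt><dd><p>“Some MDS ranks have been marked failed or damaged”</p>
</dd>
<dt>Description</dt><dd><p>This appears when one or more MDS ranks end up in the failed or damaged state because of an unrecoverable error.
The file system may be partially or fully unavailable while one (or more) ranks are offline.</p>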
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-up-less-than-max">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_UP_LESS_THAN_MAX</span></code><a class="headerlink" href="#mds-up-less-than-max" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Number of active ranks are less than configured number of maximum MDSs”</p>
</dd>
<dt>Description</dt><dd><p>The maximum number of MDS ranks can be configured with the <code class="docutils literal notranslate"><span class="pre">max_mds</span></code> configuration variable.
This health warning is generated when the number of MDS ranks falls below the configured value.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-all-down">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_ALL_DOWN</span></code><a class="headerlink" href="#mds-all-down" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“None of the MDS ranks are available (file system offline)”</p>
</dd>
<dt>Description</dt><dd><p>All MDS ranks are unavailable, so the file system is completely offline.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-clients-laggy">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_CLIENTS_LAGGY</span></code><a class="headerlink" href="#mds-clients-laggy" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“Client <em>ID</em> is laggy; not evicted because some OSD(s) is/are laggy”</p>
</dd>
<dt>Description</dt><dd><p>If one or more OSDs are laggy (because of conditions such as a network cut-off), clients may become laggy as well
(a session might go idle, or be unable to flush dirty data for capability revokes).
If <code class="docutils literal notranslate"><span class="pre">defer_client_eviction_on_laggy_osds</span></code> is set to true
(the default), the client is not evicted and this health warning is generated instead.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-clients-broken-rootsquash">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_CLIENTS_BROKEN_ROOTSQUASH</span></code><a class="headerlink" href="#mds-clients-broken-rootsquash" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl>
<dt>Message</dt><dd><p>“X client(s) with broken root_squash implementation (MDS_CLIENTS_BROKEN_ROOTSQUASH)”</p>
</dd>
<dt>Description</dt><dd><p>A bug was discovered in root_squash that could lose changes made by a client restricted with root_squash caps. Fixing it required a change to the protocol, so a client upgrade is required.</p>
<p>This is a HEALTH_ERR warning because of the danger of inconsistency and lost data.
It is recommended to either upgrade your clients, stop using root_squash in the interim, or silence the warning if desired.</p>
<p>To evict broken clients and permanently block them from connecting to the cluster,
set the <code class="docutils literal notranslate"><span class="pre">required_client_feature</span></code> bit <code class="docutils literal notranslate"><span class="pre">client_mds_auth_caps</span></code>.</p>
</dd>
</dl>
</div></blockquote>
</section>
<section id="mds-estimated-replay-time">
<h3><code class="docutils literal notranslate"><span class="pre">MDS_ESTIMATED_REPLAY_TIME</span></code><a class="headerlink" href="#mds-estimated-replay-time" title="Permalink to this heading"></a></h3>
<blockquote>
<div><dl class="simple">
<dt>Message</dt><dd><p>“HEALTH_WARN Replay: x% complete. Estimated time remaining <em>x</em> seconds”</p>
</dd>
<dt>Description</dt><dd><p>When an MDS journal replay takes more than 30 seconds, this message shows the estimated time to completion.</p>
</dd>
</dl>
</div></blockquote>
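<p>One simple way to arrive at such an estimate is linear extrapolation from the progress made so far. The sketch below assumes a constant replay rate; how the MDS actually computes its estimate is not stated here, and the function name is hypothetical.</p>

```python
def estimate_remaining_secs(elapsed_secs: float, percent_complete: float):
    """Estimate the remaining replay time by linear extrapolation.

    Assumption: replay proceeds at a constant rate. Returns None when no
    progress has been made yet, because no rate can be derived.
    """
    if percent_complete <= 0:
        return None
    return elapsed_secs * (100 - percent_complete) / percent_complete


# 25% done after 40 s implies roughly another 120 s at the same rate.
remaining = estimate_remaining_secs(elapsed_secs=40, percent_complete=25)
```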
</section>
</section>
</section>





           </div>
           
          </div>
          <footer>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2016, Ceph authors and contributors. Licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0).</p>
  </div>

   

</footer>
        </div>
      </div>

    </section>

  </div>
  

  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>

  
  
    
   

</body>
</html>